Advanced Quantitative Methods

Week 5: More on Multilevel Models

Ozan Aksoy

University of Oxford

Aims for today

  1. Recap & build on 2-level model

    • Testing hypotheses
    • Explained variance
    • Cross-level interactions
  2. Longitudinal modelling with random effects

  3. Binary outcomes

  4. Three levels

  5. Cross-classified models

  6. Countries / two-step

Hierarchical Linear Model (SB p75)

Continuous outcome \(Y\) for individual \(i\) in group \(j\): \[Y_{ij} = \beta_{0j} + \beta_{1j}x_{ij} + \ldots + R_{ij}\] Intercepts can vary by group:

\[\beta_{0j} = \gamma_{00} + U_{0j}\] coefficients (slopes) can vary by group:

\[\beta_{1j} = \gamma_{10} + U_{1j}\]

Level-one residuals \(R_{ij}\) have variance \(\sigma^2\)

Level-two residuals have variances & covariance:

\(var(U_{0j}) = \tau_{00} = \tau_0^2\) and \(var(U_{1j}) = \tau_{11} = \tau_1^2\)
\(cov(U_{0j},U_{1j}) = \tau_{01}\)

Recap - 2-level continuous outcome

What key things do you need to know?

  1. Why you use a multilevel model

  2. Interpret variance decomposition and ICC

  3. Explain rationale for and results of

    • random/varying intercepts
    • random/varying slopes
  4. Interpret correlation between slopes and intercepts

  5. Interpret coefficients (& se) of level 1 and 2 variables, and cross-level interaction

LEMMA - Centre for Multilevel Modelling

Highly recommended to do the whole online course

Step-by-step, very detailed, essential if you want to use multilevel modelling in your thesis

Pre-dates tidyverse R, but uses lme4, and everything works alongside the tidyverse too

If you don’t do the whole course, at least watch these three videos (about 1.5 hours total):

Testing hypotheses

Hypotheses can concern:

  1. variance components

  2. varying intercepts or varying slopes

  3. constant slopes

  4. cross-level interactions

3 & 4 most common

Tests for fixed parameters

\[T(\gamma) = \frac{\hat{\gamma}_h}{s.e. (\hat{\gamma}_h)}\]

\(T\)-test or Wald test, we need \(T\) and degrees of freedom d.f.

\(\gamma\) at level 1: d.f. = (# level1 units) - (# level1 vars) - 1

\(\gamma\) at level 2: d.f. = (# level2 units) - (# level2 vars) - 1

Use T distribution table (google, textbook)

Our example: d.f. > 400, T-distribution ~ normal distribution
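A minimal sketch of this test in R, using illustrative numbers (the focc estimate and d.f. that appear later in these slides):

```r
## Wald/T test for a fixed effect: T = estimate / s.e.
gamma_hat <- 0.674    # focc coefficient (from the m3 output in these slides)
se_hat    <- 0.0958
df        <- 1952     # level-1 d.f. as reported by lmerTest
t_stat <- gamma_hat / se_hat
p_val  <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)  # two-sided p
```

With d.f. this large the T distribution is essentially normal, as noted above.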

Tests for varying/random parameters

Random intercepts

  • Compare two models with the same X-variables: one with a constant intercept, one with varying intercepts

  • D = Deviance = -2 LL

    • D0 for Model 0 with m0 parameters; OLS model
    • D1 for Model 1 with m1 parameters; random intercept
  • D0 - D1 follows \(\chi^2\) distribution with d.f.= m1 - m0

Random slopes

  • Same procedure, but slight complication (S&B p 99)
  • We need the \(\bar{\chi}^2\) distribution, and the covariance parameter must be counted when determining the d.f.
  • Also: ICC now depends on \(X_{ij}\) (and where x=0 is) - not straightforward anymore
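One common version of the \(\bar{\chi}^2\) test, when a single random slope (one variance plus one covariance parameter) is added, uses a 50:50 mixture of \(\chi^2_1\) and \(\chi^2_2\). A minimal sketch, using the deviance difference from the m1-vs-m2 comparison later in these slides:

```r
## chi-bar-squared p-value for adding one random slope (S&B p99)
## null distribution: 50:50 mixture of chi^2 with 1 and 2 d.f.
D <- 1.2301  # deviance difference (lrtest of m1 vs m2 in these slides)
p_mix <- 0.5 * pchisq(D, df = 1, lower.tail = FALSE) +
         0.5 * pchisq(D, df = 2, lower.tail = FALSE)
```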

Continue with last week’s data

## Load packages
library(tidyverse)    # as usual
library(broom)        # does not work with lme4 output
library(broom.mixed)  # to get tidy() to work with lme4
library(lme4)         # a popular package for multilevel models  
library(lmerTest)     # p-value estimations and other tests
library(sjPlot)       # nicer output
df <- read_csv("Data/example_pisa_2006.csv")
df$schoolid <- as.factor(df$schoolid)
glimpse(df)
Rows: 2,221
Columns: 13
$ schoolid      <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ math          <dbl> 508.0931, 591.1279, 514.7141, 526.4761, 518.7646, 574.22…
$ read          <dbl> 513.9409, 635.0693, 522.9243, 554.3528, 588.7526, 586.52…
$ age           <dbl> 15.75, 16.00, 15.92, 15.58, 15.83, 15.33, 16.00, 16.08, …
$ female        <dbl> 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0,…
$ focc          <dbl> 29, 30, 64, 55, 33, 69, 67, 51, 70, 33, 23, 51, 54, 34, …
$ mocc          <dbl> 53, 44, 52, NA, 77, 46, 58, 43, 38, NA, 38, 53, 77, 43, …
$ foreignparent <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ misced        <chr> "isced 3b, c", "none", "isced 2", "isced 5b", "isced 5a,…
$ fisced        <chr> "isced 3b, c", "none", "isced 3b, c", "isced 5a, 6", "is…
$ books         <chr> "26-100 books", "201-500 books", "26-100 books", "101-20…
$ schsize       <dbl> 887, 887, 887, 887, 887, 887, 887, 887, 887, 887, 887, 8…
$ selective     <chr> "not considered", "not considered", "not considered", "n…

Continue with last week’s data

df$books <- as.factor(df$books)
df$books <- factor(df$books, levels = c("0-10 books", "11-25 books", "26-100 books", "101-200 books", "201-500 books", "more than 500 books"))
df$books <- as.numeric(df$books)
table(df$books)

  1   2   3   4   5   6 
243 332 680 404 362 172 

Remember to scale your variables

Making 0 a meaningful reference is a good idea; often necessary in multilevel models

## mean center variables
df$focc <- df$focc-mean(df$focc, na.rm = TRUE)
df$mocc <- df$mocc-mean(df$mocc, na.rm = TRUE)
df$age  <- df$age-mean(df$age, na.rm = TRUE)
df$books <- df$books-mean(df$books, na.rm = TRUE)

Empty model vs OLS model

mOLS <- lm(math ~ 1, data = df)
m0 <- lmer(math ~ 1 + (1 | schoolid), data = df)
library(lmtest)
lrtest(mOLS, m0)
Likelihood ratio test

Model 1: math ~ 1
Model 2: math ~ 1 + (1 | schoolid)
  #Df LogLik Df  Chisq Pr(>Chisq)    
1   2 -12855                         
2   3 -12622  1 465.48  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comparing nested models

m0 <- lmer(math ~ 1 + (1 | schoolid), data = df, REML = FALSE)
m1 <- lmer(math ~ 1 + female + focc + 
             (1 | schoolid), data = df, REML = FALSE)
lrtest(m0, m1)
Likelihood ratio test

Model 1: math ~ 1 + (1 | schoolid)
Model 2: math ~ 1 + female + focc + (1 | schoolid)
  #Df LogLik Df  Chisq Pr(>Chisq)    
1   3 -12625                         
2   5 -12549  2 151.17  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comparing nested models

m2 <- lmer(math ~ 1 + female + focc + 
             (1 + focc | schoolid), data = df, REML = FALSE)
lrtest(m1, m2)
Likelihood ratio test

Model 1: math ~ 1 + female + focc + (1 | schoolid)
Model 2: math ~ 1 + female + focc + (1 + focc | schoolid)
  #Df LogLik Df  Chisq Pr(>Chisq)
1   5 -12549                     
2   7 -12549  2 1.2301     0.5406

Explained Variance

  • Not as straightforward as (adj) R2 in OLS
  • S&B define \(R_1^2\)
  • Compare empty model and estimated model

\[R_1^2 = 1 - \frac{var(Y_{ij} - \sum_h{\gamma_hX_{hij}})}{var(Y_{ij})} = 1 - \frac{\text{model 1 }\sigma^2 + \tau_0^2}{\text{model 0 }\sigma^2 + \tau_0^2}\]

(SB7.3 p111-114)

  • Refers to explained variance at level 1
  • “proportional reduction in the value of \(\hat{\sigma}^2\) and \(\hat{\tau}_0^2\) due to including X-variables in the model”
  • For more than two levels add variance components of the additional levels
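A hedged sketch of computing \(R_1^2\) from two fitted lmer models, illustrated on lme4's built-in sleepstudy data (not the PISA data used in these slides):

```r
library(lme4)
## empty model vs model with a level-1 predictor, both fitted by ML
m_empty <- lmer(Reaction ~ 1 + (1 | Subject), sleepstudy, REML = FALSE)
m_x     <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy, REML = FALSE)
## total variance at level 1: sigma^2 + tau_0^2
tot_var <- function(m) sum(as.data.frame(VarCorr(m))$vcov)
R1sq <- 1 - tot_var(m_x) / tot_var(m_empty)
```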

Level 2 variance can go up!

Level-2 variance can increase when a level-1 variable is added to the empty model

How can this happen?

Substantive example:

  • House price example: houses nested in neighbourhoods
  • m0 = empty model; m1 = m0 + individual house size
  • Expensive areas have relatively small houses
  • We underestimate neighbourhood differences if we don’t take house size into account

See Holm, Jann & Karlson 2024 on why this is a more general problem, especially when low #i’s per j

They propose a new method to decompose unexplained variances (not for assignment and exam)

Cross-level interactions

  • Use them to test specific hypotheses: the effect of level-1 variable \(X\) differs at different values of level-2 variable \(Z\)
  • Do we need a significant random slope for this? No and yes
  • Always explore variance in slopes
  • Add interaction between \(X\) and \(Z\), just like in other regression models
    • think about the interpretation of 0 score \(X\) and \(Z\); mean-center
  • Random part of the equation doesn’t change
  • \(T\)-test for the interaction
  • As in OLS:
    • make a graph of the interaction effects
    • significance vs size

Add cross-level interaction

df$select <- ifelse(df$selective == "not considered", 0, 1)
df <- df %>% mutate(selectXbooks = select * books)
m3 <- lmer(math ~ 1 + focc + select + age + 
             female + books + selectXbooks + 
             (1 + books | schoolid), data = df, REML = FALSE)
tidy(m3)
# A tibble: 11 × 8
   effect   group    term         estimate std.error statistic     df    p.value
   <chr>    <chr>    <chr>           <dbl>     <dbl>     <dbl>  <dbl>      <dbl>
 1 fixed    <NA>     (Intercept)   506.       4.18      121.    108.   1.87e-117
 2 fixed    <NA>     focc            0.674    0.0958      7.03 1952.   2.90e- 12
 3 fixed    <NA>     select         24.9      9.12        2.73   83.1  7.65e-  3
 4 fixed    <NA>     age            11.1      4.88        2.27 1907.   2.34e-  2
 5 fixed    <NA>     female        -16.7      2.89       -5.77 1931.   9.03e-  9
 6 fixed    <NA>     books          18.0      1.19       15.1  1290.   2.76e- 47
 7 fixed    <NA>     selectXbooks   -3.76     2.79       -1.35 1407.   1.77e-  1
 8 ran_pars schoolid sd__(Interc…   30.1     NA          NA      NA   NA        
 9 ran_pars schoolid cor__(Inter…    1.000   NA          NA      NA   NA        
10 ran_pars schoolid sd__books       0.914   NA          NA      NA   NA        
11 ran_pars Residual sd__Observa…   61.9     NA          NA      NA   NA        

Add cross-level interaction

tab_model(m3)
  math

Predictors      Estimates   CI                  p
(Intercept)     506.31      498.10 – 514.51     <0.001
focc            0.67        0.49 – 0.86         <0.001
select          24.92       7.04 – 42.79        0.006
age             11.07       1.50 – 20.65        0.023
female          -16.70      -22.37 – -11.03     <0.001
books           17.98       15.64 – 20.32       <0.001
selectXbooks    -3.76       -9.23 – 1.71        0.177

Random Effects
σ²                             3829.61
τ00 schoolid                   908.64
τ11 schoolid.books             0.84
ρ01 schoolid                   1.00
ICC                            0.19
N schoolid                     87
Observations                   1977
Marginal R² / Conditional R²   0.195 / 0.350
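To read the interaction: the slope of books differs between selective and non-selective schools. A quick check with the fixed-effect estimates shown above:

```r
## implied slope of books by school type (estimates from tab_model(m3) above)
b_books <- 17.98   # slope of books when select = 0
b_int   <- -3.76   # selectXbooks interaction
slope_nonselective <- b_books          # select = 0
slope_selective    <- b_books + b_int  # select = 1; difference n.s. (p = 0.177)
```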

Add cross-level interaction

m3a <- lmer(math ~ 1 + focc + select + age + 
             female + books +  (1 | schoolid), 
           data = df, REML = FALSE)
m3b <- lmer(math ~ 1 + focc + select + age + 
             female + books +  (1 + books | schoolid), 
           data = df, REML = FALSE)
m3c <- lmer(math ~ 1 + focc + select + age + 
             female + books + selectXbooks +  (1 + books | schoolid), 
           data = df, REML = FALSE)
anova(m3a, m3b, m3c)
Data: df
Models:
m3a: math ~ 1 + focc + select + age + female + books + (1 | schoolid)
m3b: math ~ 1 + focc + select + age + female + books + (1 + books | schoolid)
m3c: math ~ 1 + focc + select + age + female + books + selectXbooks + (1 + books | schoolid)
    npar   AIC   BIC logLik -2*log(L)  Chisq Df Pr(>Chisq)
m3a    8 22100 22145 -11042     22084                     
m3b   10 22103 22159 -11042     22083 0.7363  2     0.6920
m3c   11 22104 22165 -11041     22082 1.7933  1     0.1805

Longitudinal data

Longitudinal data

  • Longitudinal data: repeated measurements obtained from a unit (individuals, countries, …)

  • fixed occasions

  • varying occasions

  • balance is not required

  • Today: linear growth model (fixed and varying occasions)

Many extensions

-  Compound-symmetry (fixed occasions)

-  Fully multivariate (fixed occasions)

-  Panel models with autocorrelation and/or heteroskedasticity (fixed/varying occasions)

-  Latent growth models (fixed occasions)

Linear growth model (MLM fitted to longitudinal data)

\[ \begin{aligned} Y_{ti} &= \pi_{0i} + \pi_{1i}\,T_{ti} + \pi_{2i}\,X_{ti} + \varepsilon_{ti} \\[6pt] \pi_{0i} &= \beta_{00} + \beta_{01}\,Z_i + u_{0i} \\[6pt] \pi_{1i} &= \beta_{10} + \beta_{11}\,Z_i + u_{1i} \\[6pt] \pi_{2i} &= \beta_{20} + \beta_{21}\,Z_i + u_{2i} \\[8pt] Y_{ti} &= \beta_{00} + \beta_{10}\,T_{ti} + \beta_{20}\,X_{ti} + \beta_{01}\,Z_i + \beta_{11}\,Z_i\,T_{ti} + \beta_{21}\,Z_i\,X_{ti} \\ &\quad + u_{0i} + u_{1i}\,T_{ti} + u_{2i}\,X_{ti} + \varepsilon_{ti} \end{aligned} \]

  • coefficients at the lowest (measurement/occasion) level: \(\pi\)

  • person level (second-level) coefficients: \(\beta\)

  • \(Y_{ti}\): response/outcome/dependent variable of individual \(i\) measured at time point \(t\)

  • \(T_{ti}\): time variable indicating the time point (0, 1, 2, …)

  • \(X_{ti}\): time varying (measurement/occasion level) covariate

  • \(Z_i\): time invariant (second-level/person-level variables) covariates

Example

  • Example taken from Hox (2010)

  • 200 students

  • GPA: Grade Point Average for six successive semesters (\(t=0,..,5\))

  • job: time-varying (semester-level) covariate: hours of work per week

  • gender

  • High-school GPA

Three models to start with

```{r}
library(texreg)  # provides screenreg()
m1 <- lmer(gpa ~ 1 + (1|sid), data=dlong, REML = FALSE)
m2 <- lmer(gpa ~ time + (1|sid), data=dlong, REML = FALSE)
m3 <- lmer(gpa ~ time + job + (1|sid), data=dlong, REML = FALSE)
screenreg(list(m1,m2,m3), digits = 3)
==============================================================
                      Model 1       Model 2       Model 3     
--------------------------------------------------------------
(Intercept)              2.865 ***     2.599 ***     2.970 ***
                        (0.019)       (0.022)       (0.044)   
time                                   0.106 ***     0.102 ***
                                      (0.004)       (0.004)   
job                                                 -0.171 ***
                                                    (0.018)   
--------------------------------------------------------------
AIC                    919.456       401.649       318.399    
BIC                    934.726       422.009       343.849    
Log Likelihood        -456.728      -196.825      -154.200    
Num. obs.             1200          1200          1200        
Num. groups: sid       200           200           200        
Var: sid (Intercept)     0.057         0.063         0.052    
Var: Residual            0.098         0.058         0.055    
==============================================================
*** p < 0.001, ** p < 0.01, * p < 0.05
```

Intra-class correlation and explained variance

  • Unlike in a standard MLM, in linear growth models use the model that controls for time as the benchmark (for technical reasons, see Hox 2010)

  • \(\rho = \frac{0.063}{0.063+0.058} = 0.52\)

  • proportion of variance in gpa at the occasion-level explained by job status:

  • \(R_1^2 = \frac{0.063 + 0.058 - (0.052+0.055)}{0.063 + 0.058} = 0.116\)

  • proportion of variance in gpa at the person-level explained by job status:

  • \(R_2^2 = \frac{0.063 + \frac{0.058}{6} - (0.052+\frac{0.055}{6})}{0.063 + \frac{0.058}{6}} = 0.158\)
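The three computations above can be reproduced directly from the screenreg variance estimates:

```r
## variance components from Models 2 and 3 above
tau2_m2 <- 0.063; sig2_m2 <- 0.058   # Model 2: time only (the benchmark)
tau2_m3 <- 0.052; sig2_m3 <- 0.055   # Model 3: time + job
n_occ <- 6                           # six semesters per student
icc  <- tau2_m2 / (tau2_m2 + sig2_m2)                                    # ICC
R1sq <- ((tau2_m2 + sig2_m2) - (tau2_m3 + sig2_m3)) / (tau2_m2 + sig2_m2)
R2sq <- ((tau2_m2 + sig2_m2 / n_occ) - (tau2_m3 + sig2_m3 / n_occ)) /
        (tau2_m2 + sig2_m2 / n_occ)
```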

Further predictors and testing for random slope of time

```{r}
m4 <- lmer(gpa ~ time + job + highgpa + female + (1|sid), data=dlong, REML = FALSE)
m5 <- lmer(gpa ~ time + job + highgpa + female + (1+time|sid), data=dlong, REML = FALSE)
screenreg(list(m4,m5), digits = 3)
=====================================================
                           Model 1       Model 2     
-----------------------------------------------------
(Intercept)                   2.641 ***     2.558 ***
                             (0.098)       (0.092)   
time                          0.102 ***     0.103 ***
                             (0.004)       (0.006)   
job                          -0.172 ***    -0.131 ***
                             (0.018)       (0.017)   
highgpa                       0.085 **      0.089 ***
                             (0.028)       (0.026)   
femaleTRUE                    0.147 ***     0.116 ***
                             (0.033)       (0.031)   
-----------------------------------------------------
AIC                         296.760       188.117    
Log Likelihood             -141.380       -85.059    
Var: sid (Intercept)          0.045         0.038    
Var: Residual                 0.055         0.042    
Var: sid time                               0.004    
Cov: sid (Intercept) time                  -0.002    
=====================================================
lrtest(m4,m5) # p/2 < 0.0001
```

Explaining random slope of time: cross-level interactions

```{r}
m5 <- lmer(gpa ~ time+female + job + highgpa + (1+time|sid), data=dlong, REML = FALSE)
m8 <- lmer(gpa ~ time*female + job + highgpa + (1+time|sid), data=dlong, REML = FALSE)

screenreg(list(m5,m8), digits = 6)
===========================================================
                           Model 1          Model 2        
-----------------------------------------------------------
(Intercept)                   2.557656 ***     2.581084 ***
                             (0.092100)       (0.092388)   
time                          0.103373 ***     0.087829 ***
                             (0.005586)       (0.007951)   
job                          -0.131119 ***    -0.132150 ***
                             (0.017264)       (0.017229)   
highgpa                       0.088541 ***     0.088504 ***
                             (0.026280)       (0.026271)   
femaleTRUE                    0.115670 ***     0.075506 *  
                             (0.031300)       (0.034652)   
time:femaleTRUE                                0.029564 ** 
                                              (0.010958)   
-----------------------------------------------------------
AIC                         188.117446       182.971012    
Log Likelihood              -85.058723       -81.485506    
Var: sid (Intercept)          0.038234         0.037811    
Var: sid time                 0.003837         0.003614    
Cov: sid (Intercept) time    -0.002491        -0.002197    
Var: Residual                 0.041542         0.041555    
===========================================================

(0.003837 - 0.003614)/0.003837 
[1] 0.05811832 # prop. explained variance in slope of time by gender
```

Time and female interaction

All predicted growth lines for girls and boys

Extracting random effects and their S.E.s

```{r}
fixef(m8)
  (Intercept)  time        femaleTRUE  highgpa      job         time:femaleTRUE 
   2.58108386  0.08782903  0.07550589  0.08850416  -0.13215018  0.02956441 
ranef(m8)
$sid
     (Intercept)          time
1   -0.229546024  4.698277e-02
2   -0.120238069  9.231293e-03
3   -0.009982086  3.576294e-02
...
coef(m8)
$sid
    (Intercept)          time femaleTRUE    highgpa        job time:femaleTRUE
1      2.351538  1.348118e-01 0.07550589 0.08850416 -0.1321502      0.02956441
2      2.460846  9.706033e-02 0.07550589 0.08850416 -0.1321502      0.02956441
3      2.571102  1.235920e-01 0.07550589 0.08850416 -0.1321502      0.02956441
....
se.ranef(m8)  # se.ranef() comes from the arm package
$sid
    (Intercept)       time
1     0.1096017 0.03579896
2     0.1096017 0.03579896
...
# 95% CI of the effect of time on GPA for student==1
coef(m8)$sid[1,2] + c(-1.96,1.96)*se.ranef(m8)$sid[1,2]
[1] 0.06464583 0.20497777
```

Binary dependent variable

  • Logit multilevel models now available in main software packages (R, Stata, MLwiN)

  • Consider LPM instead

  • Most software also allows for multinomial, ordered logit models, and related distributions like Poisson

Binary 2-level model (group j, probability P)

  • \(P_j\) is the success rate at the group level

  • Groups drawn from a population of groups with randomly varying success rates

  • Simple model (SB 17.3): \(Y_{ij} = P_j + R_{ij}\)

  • \(R_{ij}\) is then either \(-P_j\) or \(1-P_j\)

  • Mean residual = 0, but: \(var(R_{ij}) = P_j(1-P_j)\)

  • Significant differences in \(P_j\)?

  • \(\chi^2\)-test: \(\sum (\text{observed}-\text{expected})^2/\text{expected}\)

  • Here: \(\chi^2(N_{gr}-1) = \sum_{j=1}^N n_j \frac{(\bar{Y}_{.j}-\hat{P}_.)^2}{\hat{P}_.(1-\hat{P}_.)}\) (SB 17.6, but see 17.7)

Binary outcome - empty model

\[f(P) = \gamma_0 + U_{0j}\]

if link function is logit, \(f(P_j)\) is the log odds for \(j\)

\[logit(P_j) = \gamma_0 + U_{0j}\]

\(U_{0j}\) is a random variable at the group level \(\sim{N}(0,\tau^2)\)

(SB 17.10)

Where is \(R_{ij}\)?

Not in the model because it simply follows from \(P_j\): \(var(R_{ij}) = P_j(1-P_j)\)

Logistic multilevel model - ICC

  • Two approaches; quite different results possible!

  • First approach based on SB3.11 and SB3.7 (section 17.2)

    • Involves more steps to calculate, not difficult but many steps
  • Second approach defines (residual) ICC as: \[\rho_i = \frac{\tau^2}{\tau^2+ \frac{\pi^2}{3}} = \frac{\tau^2}{\tau^2+ 3.29}\]

  • This second definition is often used in sociological papers
    • Easy to calculate
    • Compare across nested models (adding independent variables)
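The second definition is a one-liner in R, here using the school-level variance estimate \(\hat\tau^2 = 1.64\) from the null model reported later in these slides:

```r
## residual ICC for a logit model: tau^2 / (tau^2 + pi^2/3)
tau2 <- 1.64                      # school-level variance (null model below)
icc  <- tau2 / (tau2 + pi^2 / 3)  # level-1 variance fixed at pi^2/3 = 3.29
```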

Null model with binary outcome

df$passed <-  ifelse(df$read >= 550, 1, 0)
ml0 <- glmer(passed ~ (1 | schoolid),  family = binomial("logit"), data = df)
summary(ml0)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: passed ~ (1 | schoolid)
   Data: df

      AIC       BIC    logLik -2*log(L)  df.resid 
   2401.6    2413.0   -1198.8    2397.6      2219 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.0641 -0.5916 -0.3753  0.6502  3.8228 

Random effects:
 Groups   Name        Variance Std.Dev.
 schoolid (Intercept) 1.64     1.28    
Number of obs: 2221, groups:  schoolid, 100

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.1713     0.1432  -8.181  2.8e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tidy(ml0)
# A tibble: 2 × 7
  effect   group    term            estimate std.error statistic   p.value
  <chr>    <chr>    <chr>              <dbl>     <dbl>     <dbl>     <dbl>
1 fixed    <NA>     (Intercept)        -1.17     0.143     -8.18  2.80e-16
2 ran_pars schoolid sd__(Intercept)     1.28    NA         NA    NA       

Null model with binary outcome

tab_model(ml0)
  passed

Predictors      Odds Ratios   CI            p
(Intercept)     0.31          0.23 – 0.41   <0.001

Random Effects
σ²                             3.29
τ00 schoolid                   1.64
ICC                            0.33
N schoolid                     100
Observations                   2221
Marginal R² / Conditional R²   0.000 / 0.333

Note: marginal R² is based on the fixed parameters only; conditional R² on fixed + random parameters.

Did we need the varying intercepts?

mlogit <- glm(passed ~ 1,  family = binomial("logit"), data = df)
lrtest(mlogit, ml0)
Likelihood ratio test

Model 1: passed ~ 1
Model 2: passed ~ (1 | schoolid)
  #Df  LogLik Df  Chisq Pr(>Chisq)    
1   1 -1361.5                         
2   2 -1198.8  1 325.41  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

More than two levels

From two hierarchical levels (i < j) to three (i < j < k)

\[Y_{ij} = \gamma_{00} + U_{0j} + R_{ij}\] \[Y_{ijk} = \gamma_{000} + V_{00k} + U_{0jk} + R_{ijk}\]

  • Variability on three levels; total variance= \(\sigma^2 + \tau^2 + \phi^2\)

  • Deviance test to see whether adding level matters

  • Random intercepts at level 2 & 3

  • Random slopes for variables at level 1 & 2

ICC in three levels

Total variance= \(\sigma^2 + \tau^2 + \phi^2\)

ICC level 3 = \(\frac{\phi^2}{\sigma^2 + \tau^2 + \phi^2}\) share of variance between groups \(k\)

ICC level 2+3 = \(\frac{\phi^2 + \tau^2}{\sigma^2 + \tau^2 + \phi^2}\) share of variance at \(j\) + \(k\)

Does \(j\) or \(k\) contribute more to variability?

ICC level 3 within 2+3 = \(\frac{\phi^2}{\tau^2 + \phi^2}\)
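A minimal sketch with illustrative (made-up) variance components:

```r
## illustrative variance components only, not from the slides' data
sig2 <- 60; tau2 <- 25; phi2 <- 15   # levels 1, 2, 3
tot <- sig2 + tau2 + phi2
icc_l3      <- phi2 / tot            # share of variance between groups k
icc_l23     <- (tau2 + phi2) / tot   # share at j and k combined
icc_l3_in23 <- phi2 / (tau2 + phi2)  # k's share of the group-level variance
```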

Effects in three level model

Constant/Fixed

  • Think very carefully about the interpretation of level 2 and 3 variables

  • Between and within effects at level 2 and 3

  • Mean-centred at what level?

Varying/Random

  • Random slopes for level 2 variables

  • Correlation between slopes and intercepts

Cross-level interactions

  • headache?

Cross-classified models


  • Observations nested in two groups, but the groups are not hierarchical

  • \(i\) is a member of \(j\) AND \(k\), where \(j\) is not nested in \(k\)

  • A third variance component is estimated: \(var(W) = \tau_w^2\)

\[ Y_{i(j,k)} = \gamma_0 + \sum_{h=1}^q\gamma_hx_{hij} + U_{0j} + \sum_{h=1}^pU_{hj}x_{hij} + W_{0k} + \sum_{h=1}^sW_{hk}x_{hij} + R_{ij} \] \(\sum_{h=1}^q\gamma_hx_{hij}\): fixed part

\(U_{0j} + \sum_{h=1}^pU_{hj}x_{hij}\): random part at schools (\(j\))

\(W_{0k} + \sum_{h=1}^sW_{hk}x_{hij}\): random part at neighbourhoods (\(k\))

  • \(W_{.k}\) and \(U_{.j}\) effects are independent

  • There are more types of ‘imperfect hierarchies’

  • We’ll skip the rest, meet with me if you need it for a specific project

Countries as level 2

  • Multilevel model or a two-step approach?

  • The two-step approach:

    1. Run the level-1 model for each country separately
    2. Save the coefficient, prediction, or effect of interest
    3. Plot/model the association of interest in the resulting country-level data set
  • Rule of thumb: 30 countries as a minimum
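A hedged sketch of the two-step approach on simulated toy data (country, x, y are illustrative names, not the slides' variables):

```r
set.seed(1)
## toy data: 30 countries, 50 individuals each, true level-1 slope 0.5
dat <- data.frame(country = rep(1:30, each = 50), x = rnorm(1500))
dat$y <- 0.5 * dat$x + rnorm(1500)
## step 1: fit the level-1 model per country, keep the slope of x
slopes <- sapply(split(dat, dat$country),
                 function(d) coef(lm(y ~ x, data = d))["x"])
## step 2: plot/model these 30 country slopes against country-level variables
```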

Next week: Panel models

Start by reading the first two short chapters by Allison

Next read Huntington-Klein and Rüttenauer & Kapelle